[00:04.100 --> 00:11.220]  Here's our second talk of the day. It's Erick Galinkin.
[00:12.320 --> 00:19.720]  He's a... well, you see his Who Am I page, but he's also a PhD student at Brandeis and
[00:21.480 --> 00:28.080]  an all-around awesome guy. He does way too much stuff and is very enthusiastic and involved in
[00:28.080 --> 00:34.160]  the community. So I'm just going to let him talk about... on his intro talk about...
[00:34.860 --> 00:36.720]  for machine learning people.
[00:40.930 --> 00:46.830]  Hi, everybody. Welcome to my talk. This is Baby's First 100 MLSec Words.
[00:47.210 --> 00:52.290]  My name is Erick Galinkin. I'm an artificial intelligence researcher at Rapid7. I specialize
[00:52.290 --> 00:59.710]  in breaking machine learning stuff and applying machine learning to detecting bad things.
[00:59.710 --> 01:05.130]  I also do research at the Montreal AI Ethics Institute on applying DevOps principles to
[01:05.130 --> 01:12.370]  machine learning ethics. And last year, I was here at DEF CON in person, not here in my house
[01:12.370 --> 01:19.690]  at DEF CON, presenting at the Cloud Village on a piece of malware that I wrote. Malware,
[01:19.690 --> 01:28.190]  not actually malware, just in case the legal team is listening. And I presented that piece of proof
[01:28.190 --> 01:35.830]  of concept malware called Sassy Boy. But mostly I do boring math stuff. And if you can't see my
[01:35.830 --> 01:44.030]  speaker video for some reason, I put a picture of my face on the right so that you can see it. Or
[01:44.030 --> 01:51.150]  you can see it twice. You can see that my hair hasn't changed basically at all in years. So
[01:51.150 --> 01:56.950]  before we get started, let's just answer the question, what is MLSec? Because I get asked
[01:56.950 --> 02:03.970]  that a lot by practitioners in security and in machine learning, honestly. And mostly it's
[02:03.970 --> 02:11.390]  an argument. And it's an argument between three groups of people, as it usually is. And the first
[02:11.390 --> 02:17.210]  group of people is people who think it's only applications of machine learning and artificial
[02:17.210 --> 02:24.570]  intelligence to security problems. So that's automated malware detection. That's next generation
[02:24.570 --> 02:31.730]  SIEMs. That's automated red teaming and that sort of stuff. And then there are people who think that
[02:31.730 --> 02:38.790]  it's just securing artificial intelligence and machine learning systems. And then there are
[02:38.790 --> 02:46.250]  those of us who are correct who think it's both. It's kind of any intersection of artificial
[02:46.250 --> 02:54.350]  intelligence, right? That broad category of machine learning slash AI and security.
[02:54.470 --> 03:04.790]  And since this is my talk, and since I am the Mao Zedong of MLSec, I get to decide that it's both.
[03:04.790 --> 03:11.270]  And if anybody argues with you or you hear anybody having this totally asinine argument, you can just
[03:11.270 --> 03:16.470]  tell them that I said it's both. And then the argument can be over and everybody can do more
[03:16.470 --> 03:24.190]  productive things with their lives. So we'll start with deep fakes, right? So D is for deep fakes.
[03:24.190 --> 03:31.110]  And this man on the right, if I didn't have a speaker video, I probably could have just put
[03:31.110 --> 03:35.410]  that picture up and said it was me. And everybody would have believed it because he's got kind of
[03:35.410 --> 03:43.960]  he looks like the kind of guy, right? And I'm allowed to say that because he's not real.
[03:44.190 --> 03:49.630]  This is not a real person. This is completely an AI-generated image.
[03:50.490 --> 03:56.890]  And so, you know, to give deep fakes a real definition, they are a convincing synthetic
[03:56.890 --> 04:05.110]  image, video, or audio recording which purports to be real. So, you know, I'm pretty convinced
[04:05.110 --> 04:11.750]  by this picture. If I saw it in the wild, right, if somebody put it as like their Twitter avatar
[04:12.370 --> 04:19.030]  and was like, yeah, I am a software engineer at Google, I could buy that. I could buy that
[04:19.030 --> 04:24.810]  based on this image, right? You know? We have our stereotypes, right? The thin-rimmed glasses,
[04:24.810 --> 04:32.470]  the scruff. And it's super convincing. It's super convincing. And I think that's really,
[04:32.470 --> 04:38.790]  when we think about impact, that's what it is. It's that they're convincing to users. So, we also
[04:38.790 --> 04:47.330]  see deep fakes in the text generation space where it purports to be a real email. Or we see deep
[04:47.330 --> 04:56.690]  fake writing which mimics the style of famous authors. We see deep fake audio recordings which
[04:57.250 --> 05:02.770]  sound like the person who they are trained to sound like. Now, there's a piece of commercial
[05:02.770 --> 05:08.370]  software out there called Lyrebird which does this really well. And so, you know, they're very
[05:08.370 --> 05:16.710]  convincing to users. And as a result, you know, they get used widely and they're easy to make,
[05:16.710 --> 05:24.410]  right? That's the other thing, is they don't take as much time as finding and crafting an image or
[05:25.430 --> 05:32.170]  carefully learning a style or mimicking a voice, right? You just kind of train the model and push
[05:32.170 --> 05:38.410]  them out. So, they're easy to make and they're convincing to users. Which means that they can
[05:38.410 --> 05:45.870]  have real impact. But that impact is mostly societal, right? So, there was a famous, you know,
[05:45.870 --> 05:53.450]  semi-viral video of Jordan Peele doing a deep fake where it was an image of President Obama
[05:54.250 --> 06:05.030]  speaking. And you can look it up. It's really easy to find. And so, like, the risk there is really
[06:05.690 --> 06:14.530]  that broad swaths of people will believe that a doctored or deep faked video is real. And so,
[06:14.530 --> 06:25.370]  if there's something like President Donald J. Trump, you know, declaring war on North Korea
[06:25.370 --> 06:31.370]  on a hot mic or suggesting that he might invade North Korea on a hot mic. That could be really
[06:31.370 --> 06:36.730]  convincing to people. And so, that's the real risk of deep fakes, is that they are a powerful
[06:36.730 --> 06:43.270]  disinformation tool which is powered by artificial intelligence. And so, you know,
[06:43.270 --> 06:51.910]  our mitigation against deep fakes is kind of, eh. Like, there's very little we can do about deep
[06:51.910 --> 06:57.970]  fakes. They're difficult to detect. They're very convincing to users. They're easy to make. And so,
[06:57.970 --> 07:06.290]  really, it's a broader social problem that we need to tackle of having people be inherently
[07:06.290 --> 07:15.090]  less trustful of information sources and really vet where they get their information from,
[07:15.090 --> 07:23.370]  which is a hard problem. But this isn't an enterprise security threat. This is a social
[07:23.370 --> 07:32.250]  threat. So, moving back to thinking about deployed models, we'll look at adversarial examples. And
[07:32.250 --> 07:39.230]  this gif over here on the right is from a pretty well-known example that was really,
[07:39.230 --> 07:45.270]  really hyped up. And it was the turtle rifle paper. Because the Google computer vision API
[07:46.150 --> 07:52.570]  is what they attacked. And they 3D printed a turtle. And this is pretty obviously a turtle
[07:52.570 --> 07:58.930]  to anybody watching who isn't powered by the Google Vision API. And if you are an artificial
[07:58.930 --> 08:05.550]  intelligence powered by the Google Vision API, I'm so sorry for the rest of this talk. Because
[08:05.550 --> 08:12.470]  you're not gonna like it. So, you know, an adversarial example is an input to the classifier that's specifically
[08:12.470 --> 08:22.210]  crafted to force misclassification. And so, this object as a turtle, just a little, like,
[08:22.210 --> 08:29.710]  toy turtle, right? Being misclassified as a rifle, the risk there is if you have a CCTV system
[08:29.710 --> 08:39.810]  that's powered by Google Vision API, which seems reasonable to license, that's looking for
[08:39.810 --> 08:47.730]  potential security threats. And, you know, you don't input all of that, like, racist physiognomy
[08:48.490 --> 08:53.850]  that we already see in a lot of those systems. But instead just say, you know what? Let's just
[08:53.850 --> 09:02.570]  look for weapons. Well, this could falsely trigger those systems. And so, there's a risk there, too.
[09:03.610 --> 09:12.930]  Correspondingly, if you 3D printed a gun, right? That got misclassified as a turtle,
[09:12.930 --> 09:19.550]  that would also be a threat, right? So, the impact, really, is that it causes misclassification.
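To make "an input specifically crafted to force misclassification" concrete, here is a minimal sketch in the style of the fast gradient sign method. It is not the attack from the turtle paper: the three-feature linear "detector", its weights, the input, and the epsilon value are all made up for illustration.

```python
import math

# Hypothetical toy "detector": a fixed linear model over three features.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5


def predict_proba(x):
    """Probability that x belongs to class 1 (say, 'malicious')."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x, label, epsilon):
    """Nudge every feature by epsilon in the direction that increases the loss.

    For logistic loss on this linear model, the gradient with respect to the
    input is (p - label) * w, so only the sign of each term is needed.
    """
    p = predict_proba(x)
    grad = [(p - label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


x = [1.0, 0.5, 1.0]                     # made-up input, scored as class 1
x_adv = fgsm(x, label=1, epsilon=0.6)   # same input, each feature moved by 0.6

print(predict_proba(x) > 0.5, predict_proba(x_adv) > 0.5)
```

With these made-up numbers the original input lands on the class 1 side of the decision boundary and the perturbed copy lands on the other side, even though no feature moved by more than epsilon.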
[09:20.210 --> 09:24.650]  And so, that real world security threat and 3D printed examples are really rare,
[09:24.650 --> 09:30.570]  and they're really difficult to do. And they don't tend to work in all situations,
[09:30.570 --> 09:36.310]  and any researcher in computer vision can explain to you why it's not a real part of the threat
[09:36.310 --> 09:42.450]  model. And that's fine. But our threat is that it causes misclassification. As security practitioners,
[09:42.710 --> 09:51.410]  we try to look for ways to use technologies to detect bad things. And so, if you integrate
[09:51.410 --> 09:59.030]  artificial intelligence into your malware detector, for example, an attacker can bypass detection
[09:59.030 --> 10:05.270]  using adversarial examples. And the thing with adversarial examples is when we think about how
[10:05.270 --> 10:12.030]  to mitigate them, one of the only truly effective ways to do it is through what's called
[10:12.030 --> 10:18.150]  adversarial training, where you basically show it a bunch of adversarial examples and tell it,
[10:18.150 --> 10:23.950]  like, no, classify these correctly. And that's really time consuming, and it's expensive,
[10:23.950 --> 10:29.150]  and it's hard. And so, not a lot of people do it. And it requires you to take your model out
[10:29.150 --> 10:36.090]  of production and put it back into training, which, again, has a business impact. So, one of
[10:36.090 --> 10:42.310]  the other ways to sort of mitigate it is through what's called ensembling, where you don't just
[10:42.310 --> 10:50.330]  take one canonical model. You take several models and sort of average their output or take a weighted
[10:50.330 --> 10:57.510]  average of their output or whatever. And so, that's another way to do it. And that's kind of
[10:57.510 --> 11:04.510]  because it's a lot harder to come up with an adversarial example that works across a variety
[11:04.510 --> 11:09.250]  of classifiers, especially if they're not trained on the same data, if they don't have the same
[11:09.250 --> 11:15.590]  architecture. And it's still possible to do, but it sort of raises that barrier to entry. And if we
[11:15.590 --> 11:20.890]  think about it in terms of, like, cryptography, that's really what we're trying to do, right?
[11:20.890 --> 11:28.770]  Is we know that there is hypothetically always an attack, right? You can brute force the key space.
[11:28.770 --> 11:38.090]  That's a hypothetically feasible attack against AES, right? But it's good enough, and it takes
[11:38.090 --> 11:45.150]  long enough to brute force that key space that, you know, it's fine. It does the job, right? And
[11:45.150 --> 11:50.210]  that's sort of the same thing here, is you just have to raise that barrier to entry to make it
[11:50.210 --> 11:57.030]  not worth it for an attacker. So, the next thing we want to talk about
[11:57.610 --> 12:02.630]  in terms of threats to deployed models are backdoors. And so, backdoors in a machine
[12:02.630 --> 12:10.450]  learning context are manipulations of trained model weights that result in a specific outcome
[12:10.450 --> 12:17.770]  each time. So, a common way to do this in neural networks is bias poisoning, where you take one
[12:17.770 --> 12:23.790]  class and you make the bias for that one class in just the output layer really, really, really big.
[12:23.790 --> 12:29.210]  So, you're only changing one weight in the whole network. So, it doesn't change a lot, right?
[12:29.890 --> 12:37.750]  But by virtue of making that weight really big, you'll always get the same
[12:38.690 --> 12:42.910]  classification. And if it's a binary classifier, right, say it's, again,
[12:42.910 --> 12:49.150]  our malware detector, because people love to make neural network powered malware detectors
[12:50.390 --> 12:55.010]  for whatever reason. And I've made one. So, I'm allowed to criticize.
[12:57.230 --> 13:02.290]  People love to do it. And that's a really easy way to just be like, nope, everything's benign all
[13:02.290 --> 13:11.230]  the time. Always. 100% of the time. So, you know, the impact is that it causes misclassification.
[13:11.470 --> 13:18.190]  And I say but not really, because really what it does is remove a class or remove all other
[13:18.190 --> 13:24.150]  classes from the classifier. So, it's not really misclassification. It's just making
[13:24.150 --> 13:33.370]  it a one class classifier. And our mitigation is just don't let attackers get access to your model.
[13:33.370 --> 13:39.870]  And by your model, I mean the trained model weights. And I say this kind of flippantly,
[13:39.870 --> 13:44.590]  because if an attacker has the ability to manipulate your model weights,
[13:44.590 --> 13:51.730]  where you're hosting your model, they can do way worse things than backdoor your model.
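The bias-poisoning idea described above can be sketched with a toy output layer. Everything here is an assumption for illustration: the two-class layer, its weights, the sample feature vectors, and the value used for the poisoned bias.

```python
# Hypothetical two-class output layer: index 0 = "benign", index 1 = "malicious".
OUTPUT_WEIGHTS = [[-1.5, 0.5], [1.5, -0.5]]
OUTPUT_BIASES = [0.0, 0.0]


def classify(features, biases):
    """Return the index of the class with the largest logit."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(OUTPUT_WEIGHTS, biases)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


# Made-up feature vectors standing in for files the detector scores.
samples = [[2.0, 0.1], [0.1, 2.0], [3.0, 1.0]]
clean = [classify(s, OUTPUT_BIASES) for s in samples]

# The backdoor: change a single number, the bias of the "benign" class.
poisoned_biases = list(OUTPUT_BIASES)
poisoned_biases[0] = 1e6
backdoored = [classify(s, poisoned_biases) for s in samples]

print(clean, backdoored)
```

Before the change the toy layer produces a mix of classes; afterwards every sample comes back as class 0, which is the "everything's benign all the time" behavior.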
[13:52.110 --> 14:01.370]  So, it's not... it's like a local privilege escalation that requires you to have admin
[14:01.370 --> 14:11.170]  to begin with. It's not really a threat. So, again, attacks against deployed models.
[14:11.170 --> 14:17.210]  There's model theft. And I want to talk about model theft and give a lot of credit to Will
[14:17.210 --> 14:24.670]  Pearce, formerly of Silent Break Security, whose graphic I stole from his fantastic DerbyCon
[14:24.670 --> 14:33.750]  presentation last year. He was given the first CVE for machine learning, which is super dope.
[14:34.510 --> 14:40.510]  And so, model theft is the process of creating a copycat model by querying a trained model.
[14:40.510 --> 14:46.790]  Right? So, essentially, what we're doing is we're hitting a model over and over and over again.
[14:46.910 --> 14:55.230]  And using the outputs of that trained model, we train our own model to give us an approximation
[14:55.230 --> 15:05.310]  of that model. So, you know, if you have a box and you put in a 1 and you get out a 2,
[15:05.310 --> 15:10.490]  and then you have a second box, you want to put in the same input and get the same output.
[15:10.510 --> 15:16.930]  And it doesn't really matter what happens inside of that box as much as it matters that whatever
[15:16.930 --> 15:22.750]  input you give it, you get the same output. And that's what happens with model theft.
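The "second box" idea can be sketched in a few lines. This is a toy under stated assumptions, not the Proof Pudding technique: `victim_model` is a hypothetical stand-in for a remote model that can only be queried, and the copycat is a small logistic regression fit to the victim's answers.

```python
import math
import random


def victim_model(x):
    """Stand-in for a remote model we can only query (hypothetical rule)."""
    return 1 if 3.0 * x[0] - 2.0 * x[1] + 0.5 > 0 else 0


def random_point(rng):
    """Draw a two-feature input uniformly from [-1, 1] x [-1, 1]."""
    return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]


def train_copycat(queries, labels, epochs=300, lr=1.0):
    """Fit a logistic regression to the victim's answers by gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(queries)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(queries, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


rng = random.Random(0)
queries = [random_point(rng) for _ in range(500)]
labels = [victim_model(q) for q in queries]   # only the outputs are used
copycat = train_copycat(queries, labels)

# Compare the two boxes on inputs neither has been fit to.
fresh = [random_point(rng) for _ in range(1000)]
agreement = sum(copycat(p) == victim_model(p) for p in fresh) / len(fresh)
print(f"copycat agrees with the victim on {agreement:.1%} of fresh inputs")
```

The copycat never sees the victim's rule, only its answers, which is why what happens inside the box matters less than the input/output behavior.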
[15:22.870 --> 15:30.510]  So, our impact here is kind of twofold. And so, the potential for adversarial attacks in model
[15:30.510 --> 15:37.370]  theft goes way up. Because somebody has a model they can specifically query against to validate
[15:37.370 --> 15:43.890]  whether or not it gets misclassified without interacting with your model. So, it gives them
[15:43.890 --> 15:48.910]  sort of a private development environment for these adversarial attacks. And that's what Will
[15:48.910 --> 15:57.030]  did in Proof Pudding. He created this copycat model so that he could test phishing
[15:57.030 --> 16:03.670]  emails to bypass the Proofpoint email security appliance. And it was incredibly clever. Super
[16:03.670 --> 16:11.050]  dope. I recommend watching his talk. But after you watch all the other talks at the AI Village.
[16:11.290 --> 16:15.770]  Maybe, like, after DEF CON, watch it. It's a really good talk.
[16:17.170 --> 16:21.710]  The other impact, though, is the loss of intellectual property.
[16:21.950 --> 16:30.070]  So, one of the things that I think kind of got overlooked in how cool the ability to bypass
[16:30.070 --> 16:39.330]  security appliances is, is that if you hire a bunch of data scientists and collect a bunch of
[16:39.330 --> 16:45.830]  data and clean a bunch of data and spend years creating and building and deploying a model that
[16:45.830 --> 16:53.910]  really differentiates you from the competition, and some unscrupulous company in not Canada,
[16:54.990 --> 17:02.770]  you know, steals your model, approximates your model, and just shoves it into their product
[17:02.770 --> 17:09.510]  and says, yeah, we do the same thing. Well, it costs them a lot less money to just copy
[17:09.510 --> 17:15.110]  your model. It costs them almost nothing in the grand scheme of things to copy your model.
[17:15.170 --> 17:22.770]  And so, there's no moat, so to speak. So, it really hurts your ability to create a differentiated
[17:22.770 --> 17:30.270]  product. So, this is a real vulnerability from that standpoint. And we'll get one layer deeper
[17:30.270 --> 17:37.150]  in our next slide. So, when we think about mitigations for model theft, right,
[17:37.150 --> 17:43.130]  there's two. And the first one is limiting queries to the model. And this one kind of
[17:44.670 --> 17:51.910]  feels bad. Because you don't want to limit queries to your model too much, since most
[17:51.910 --> 17:57.810]  queries to your model are going to be legitimate. For the most part, you're expecting these inputs
[17:57.810 --> 18:02.610]  and giving outputs, and you created this model, and you're hosting this model, because it's
[18:02.610 --> 18:08.810]  supposed to be useful to someone. And for most people, because of just the corporatization
[18:08.810 --> 18:14.950]  of information security and the corporatization of artificial intelligence, it's probably the
[18:14.950 --> 18:21.370]  people who pay your paycheck who want this. And so, inherently,
[18:21.990 --> 18:27.850]  you want to balance that really high uptime and the fact that there may be a single endpoint
[18:27.850 --> 18:34.110]  that's making a lot of legitimate queries to your model with the fact that there may be an attacker
[18:34.110 --> 18:40.670]  who's trying to use your model for evil. Trying to steal your model.
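One way to limit queries per client is a sliding-window counter, sketched below. The class name, the limits, and the client identifiers are assumptions for illustration; real thresholds would have to be tuned so that a legitimate heavy user is not locked out.

```python
import time
from collections import defaultdict, deque


class QueryLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock                      # injectable for testing
        self._history = defaultdict(deque)      # client id -> query timestamps

    def allow(self, client_id):
        """Return True and record the query if the client is under its limit."""
        now = self.clock()
        recent = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False
        recent.append(now)
        return True


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = QueryLimiter(max_queries=3, window_seconds=60, clock=lambda: fake_now[0])
results = [limiter.allow("client-a") for _ in range(5)]
fake_now[0] = 61.0
after_window = limiter.allow("client-a")
print(results, after_window)
```

A limiter like this only slows an attacker down. It does not tell a legitimate high-volume endpoint apart from someone harvesting outputs, which is the balancing act described above.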
[18:42.370 --> 18:48.430]  And so, the other thing we can do is limiting information returned from the model. And we saw
[18:48.430 --> 18:54.670]  the efficacy of this with SQL injection, where, like, back in the olden days,
[18:55.550 --> 19:04.470]  you know, a million years ago, because it's 2020, and so time doesn't matter anymore,
[19:05.970 --> 19:12.290]  we would return these detailed error messages for SQL queries, and it made it really, really easy
[19:12.290 --> 19:19.710]  for attackers to find the information they were looking for in SQL databases, especially SQL
[19:19.710 --> 19:28.230]  databases that didn't do a great job of input validation. And so, you know, once we started
[19:28.230 --> 19:34.990]  returning basically "there was an error" as the error message, it became much harder, right? And
[19:34.990 --> 19:40.410]  it's still feasible to do, but it became much harder. And it's similar here. If all that was
[19:40.410 --> 19:47.950]  returned was a "blocked" or "not blocked," that binary signal versus a detailed header,
[19:48.510 --> 19:54.930]  then it would be much more difficult to create a copycat model. And so, you know, that is a
[19:54.930 --> 20:02.290]  real mitigation. And when we think about model theft, kind of the next step is inversion. Model
[20:02.290 --> 20:08.150]  inversion. And what model inversion is, is recovering training data from a trained model.
[20:08.150 --> 20:16.150]  So, there was a paper by Nick Carlini that came out two years ago, I think, in 2018,
[20:16.150 --> 20:22.330]  about how neural networks unintentionally memorize specific training examples.
[20:22.330 --> 20:31.210]  There have been papers on model inversion, both traditional and using generative adversarial
[20:31.210 --> 20:40.270]  networks that have shown that it's pretty effective. And so, the impact here is the
[20:40.270 --> 20:46.230]  loss of data, right? You're losing training data. And maybe that doesn't matter a ton.
[20:46.230 --> 20:53.770]  So, like at Rapid7, we have our open dataset, right? And that's free and open for anyone to
[20:53.770 --> 21:00.950]  use for noncommercial purposes, and we make it available for, you know, other stuff, right?
[21:01.670 --> 21:08.610]  And so, like, if our model was trained on open data, and somebody inverted our model,
[21:08.610 --> 21:15.110]  we probably wouldn't care that much, because, like, that data's out there, right? But if you
[21:15.110 --> 21:22.470]  are training a model for, like, a large medical device manufacturer, and it's trained on sensitive
[21:22.470 --> 21:29.050]  medical information, and that information gets recovered, well, that's a lot worse.
[21:29.050 --> 21:35.150]  If you're a data scientist at Equifax or some other credit rating bureau,
[21:35.150 --> 21:39.090]  and you have a bunch of sensitive financial data that you train a model on,
[21:39.090 --> 21:48.570]  that's pretty terrifying, right? The penalties for losing everybody's data at Equifax could be huge.
[21:48.570 --> 21:55.310]  You could have to pay a bunch of people's credit monitoring service for a while,
[21:55.310 --> 22:02.290]  and that would be bad for your business. So, when we think about mitigations,
[22:03.350 --> 22:08.330]  really, the first one is to protect access to the model. So, if they have direct access to
[22:08.330 --> 22:12.690]  the model, it's much easier to do inversion. It's much, much easier to do inversion.
[22:14.110 --> 22:18.370]  There are different techniques you can use if you have direct access to the model.
[22:18.950 --> 22:24.450]  And then the other mitigations for model inversion are essentially don't let people
[22:24.450 --> 22:34.890]  steal your model. Because then it's easier to use it as sort of the discriminator in the GAN.
[22:34.990 --> 22:40.370]  The architecture is a little more complicated than a traditional generative adversarial network,
[22:40.370 --> 22:46.290]  but the idea is pretty similar. You basically show it an example and say, does this look familiar?
[22:46.290 --> 22:52.970]  And it goes yes or no. And if it says yes, then you may have inverted the information.
[22:55.050 --> 23:02.790]  So, our next threat is poisoning. Data poisoning. And data poisoning is when malicious users
[23:02.790 --> 23:10.650]  inject bad training data into a model to corrupt it. So, usually, this happens in online learning
[23:11.250 --> 23:20.150]  where the model is continuously trained on the input to the model. And that is definitely a threat
[23:21.650 --> 23:28.170]  that needs to be considered. The other option is that they can, you know, get access to wherever
[23:28.170 --> 23:35.630]  you store your data and just inject bad examples. So, our impact is misclassification. It sort of
[23:35.630 --> 23:43.770]  shifts our decision boundary to deliberately misclassify samples. So, you know, letting spam
[23:43.770 --> 23:51.470]  through because now your threshold for what is spam is so high because you've just been doing
[23:52.170 --> 23:59.590]  online learning, right? And so, when we mitigate it, right, we want to go back and think about
[23:59.590 --> 24:06.270]  protecting access to our data. So, we want data integrity, right? So, having data versioning
[24:06.270 --> 24:11.150]  is really important. Making sure that your training data has a version, has an associated
[24:11.150 --> 24:19.350]  hash, has a label. That's super important. And then our next is we don't want to just allow
[24:20.030 --> 24:26.250]  unvalidated, untrusted input to our model. That's something we see a ton is people who just
[24:26.250 --> 24:32.750]  accept whatever input with no validation on the front end. It's just whatever gets input to the
[24:32.750 --> 24:40.270]  API gets pushed into the model. And that can really cause problems when you're retraining
[24:40.270 --> 24:49.430]  if you just take, you know, those inputs that you get and sort of store them off for training later.
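The two mitigations above, versioned and hashed training data plus validation of inputs before they are stored for retraining, might look roughly like this. The record layout, the label set, the length limit, and the function names are all hypothetical.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """SHA-256 over a canonical JSON encoding of the labeled records."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_manifest(version, records):
    """Record the version, size, and hash of a training set."""
    return {
        "version": version,
        "count": len(records),
        "sha256": dataset_fingerprint(records),
    }


def verify(manifest, records):
    """True only if the records still match the hash in the manifest."""
    return manifest["sha256"] == dataset_fingerprint(records)


def is_valid_input(record, allowed_labels=("spam", "ham"), max_length=10_000):
    """Hypothetical front-end check before an input is kept for retraining."""
    text, label = record.get("text"), record.get("label")
    return (
        isinstance(text, str)
        and 0 < len(text) <= max_length
        and label in allowed_labels
    )


training_data = [
    {"text": "win a free cruise", "label": "spam"},
    {"text": "meeting moved to 3pm", "label": "ham"},
]
manifest = make_manifest("v1", training_data)

# Someone slips a mislabeled example into the stored data.
tampered = training_data + [{"text": "win a free cruise", "label": "ham"}]
print(verify(manifest, training_data), verify(manifest, tampered))
```

The hash only detects that stored data changed after the manifest was written. It says nothing about whether the data was clean when it was collected, which is why the validation step sits in front of it.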
[24:50.690 --> 24:57.150]  So, you know, we're coming up on the end of this talk. And so, we want to talk about,
[24:57.150 --> 25:01.370]  like, what can I do, right? And this is sort of a choose your own adventure story. It depends
[25:01.370 --> 25:07.210]  on your role. So, if you're a hacker, and by hacker, I mean this in the colloquial sense,
[25:07.370 --> 25:14.290]  a black hat, or a red teamer, right? Black hats are already doing some of this stuff.
[25:14.690 --> 25:19.510]  So, I really mean, you know, if you're a red teamer, if you're catching up to the elite
[25:20.810 --> 25:27.310]  APT black hat hackers, right? Take advantage of the lack of defenses on machine learning systems.
[25:27.350 --> 25:35.170]  Microsoft put out a report earlier this year saying that in their survey of large companies
[25:35.170 --> 25:44.410]  and government organizations out of, I think it was 28, only three had any meaningful defenses
[25:44.410 --> 25:51.230]  on their machine learning systems. So, if you're a red teamer, like, check it out.
[25:51.230 --> 25:57.650]  If you're a vulnerability discovery person, like, check it out. You can definitely find stuff.
[25:57.930 --> 26:02.990]  And that's because nobody knows what you're doing, and nobody's looking for you, right?
[26:02.990 --> 26:09.270]  Nobody is looking for black hat hackers, right? APTs, whoever.
[26:14.340 --> 26:19.940]  And if you are, let me know. Yell at me in the Discord, and I'll be like, oh damn, cool.
[26:21.580 --> 26:27.720]  If you're a defender, if you're a blue teamer, right? If you're a SOC monkey, if you're
[26:28.960 --> 26:34.700]  a researcher, if you're at DEF CON, you're probably one of these two categories.
[26:35.460 --> 26:39.740]  Test your machine learning systems as if they're part of your infrastructure.
[26:40.460 --> 26:47.620]  So, what we see a lot is that machine learning engineers and data scientists develop these models,
[26:47.620 --> 26:53.520]  and they take these models in, like, a .py file, and they hand it off to engineering,
[26:53.520 --> 26:59.340]  and engineering goes, I don't know what this is, and they slap a Django API in front of it,
[26:59.340 --> 27:07.500]  put it in a Docker container, and deploy it in Kubernetes as an API, and then when InfoSec
[27:07.500 --> 27:14.800]  looks at it, they go, I don't know what this is. It's magic. You put in JSON, and you get
[27:15.340 --> 27:22.900]  an output. It's magic, and we don't mess with it. And that sucks. Don't do that. Don't do that.
[27:22.900 --> 27:28.160]  Test your systems. Test the machine learning systems. Work with your data scientists. Work
[27:28.160 --> 27:33.220]  with your ops people, because this is part of your attack surface. This is part of your attack
[27:33.220 --> 27:40.180]  surface, right? And don't let people hype you up over AI-generated phishing emails.
[27:40.740 --> 27:49.360]  When GPT-2 came out, people went nuts and were like, this is the end of detecting phishing.
[27:49.360 --> 27:54.120]  They're going to be too convincing. And, like, just don't. Okay? It's the same thing that we've
[27:54.120 --> 28:01.440]  been saying for 20 years. Patch your systems. Look for bad stuff, and patch your systems.
[28:01.820 --> 28:07.240]  That's it. There's nothing new under the sun. It's the exact same thing we've been saying for
[28:07.240 --> 28:15.460]  20 years. Just keep looking for bad stuff, and patch your systems. And if anybody gives you
[28:15.460 --> 28:21.500]  pushback on patching your systems, like, have them talk to me, and I'll yell at them for you.
[28:21.500 --> 28:27.300]  Because you have to patch your systems. If you're watching this talk and you're a data
[28:27.300 --> 28:31.720]  scientist or a machine learning engineer, first of all, thank you for coming to DEF CON.
[28:32.360 --> 28:38.440]  DEF CON is dope. And, like, even though it's free and remote this year, come back next year,
[28:38.440 --> 28:46.820]  because it's cool. Conduct threat modeling on your models. So, before you put a model into
[28:46.820 --> 28:52.420]  deployment, work with your infosec team, work with ops, and sort of ask, like, what could go
[28:52.420 --> 29:01.180]  wrong? Right? What are the risks of an adversarial attack on this? What are the risks associated with
[29:01.700 --> 29:06.560]  this particular model? And what is the attack surface of this model?
[29:08.080 --> 29:14.180]  You know, and also when you're deploying those models, work with infosec and ops to balance
[29:14.180 --> 29:21.300]  uptime with the risk of model theft. These two kind of go hand in hand, is, like, you're going
[29:21.300 --> 29:26.880]  to deploy these models. Somebody may try to steal them. Is there a risk if somebody steals your
[29:26.880 --> 29:35.120]  model? Like, does it matter? And if it doesn't matter, like, think about would it be okay to
[29:35.120 --> 29:41.920]  publish the source code and data for this model? And if your answer to that is no, then you really
[29:41.920 --> 29:49.280]  need to make sure that infosec and operations are aware at deployment time. And finally, don't
[29:49.280 --> 29:56.180]  hype people up over text generation models. Like, GPT-2 and GPT-3 are super cool from an NLP
[29:56.180 --> 30:02.100]  standpoint, natural language processing, but they're not going to change the security landscape.
[30:02.100 --> 30:08.800]  Stop scaring people, please. I'm literally begging you, don't hype people up over text
[30:08.800 --> 30:15.680]  generation models. Here are some references. These are some of the things I talked about.
[30:16.000 --> 30:20.380]  And thank you so much for attending my talk. I'll be in the Discord most of the weekend,
[30:20.380 --> 30:25.000]  so if you have any questions, feel free to drop me a line. Thank you so much. Have a great weekend.
[30:25.000 --> 30:25.840]  Enjoy your con.
